---
title: Brief Intro to Algorithms pt.1
description: >
  Notes on basic algorithm theory.
categories: posts
tags: [algorithms, ruby]
---

<details markdown="1" id="table-of-contents">
<summary>
Table of Contents
</summary>

* TOC
{:toc}
</details>

## Definition

In general, an algorithm is a step-by-step procedure that defines a set of
instructions to be executed in a certain order to get the desired output.

As we'll see, algorithms are closely related to data structures. Some important
categories of algorithms, from a data structure point of view, are:

- Search.
- Sort.
- Insert.
- Update.
- Delete.

But more on that later.

## Characteristics of an Algorithm

Earlier we mentioned that an algorithm is a step-by-step procedure. This doesn't
mean that every procedure can be called an algorithm. An algorithm should be:

- Correct. It should clearly define its inputs (zero or more) and outputs (one or more).
- Understandable. Its steps should be clear and unambiguous.
- Efficient. It should be feasible, finite, and independent of any particular language.

Algorithms should only be designed after the problem domain has been well defined.
Since they are generally created independent of the underlying language, we rely
heavily on pseudocode.

## Algorithm Analysis

Efficiency is one of the main concerns of an algorithm. It can be analyzed at two
different stages: before and after implementation.

- _A priori_ analysis is the theoretical part of analyzing an algorithm. Efficiency
is measured by assuming all other factors have no effect on the implementation.

- _A posteriori_ analysis is the empirical part. The selected algorithm is
implemented in a programming language, then executed and analyzed using its
actual statistics: running time, space required, and so on.
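
As a sketch of a posteriori analysis, we could time an operation on real input
using Ruby's standard `Benchmark` module (the array size and the two operations
compared here are arbitrary choices for illustration):

```ruby
require 'benchmark'

# Empirically compare two ways of finding the maximum of 100,000 random values.
values = Array.new(100_000) { rand(1_000_000) }

Benchmark.bm(12) do |bm|
  bm.report('linear max:')  { values.max }
  bm.report('sort + last:') { values.sort.last }
end
```

Running this prints the user, system, and wall-clock time of each report block,
giving us actual statistics to compare.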

One way of figuring out how complex an algorithm is, is to analyze it relying on
complexity theory.

## Complexity Theory

Complexity theory is the study of how complicated algorithms are in terms of
their efficiency, which can be measured by the time they take, the memory space
they require, and so on.

In order to improve the efficiency of an algorithm we need to look at its best,
average, and worst case behaviors. To be on the safe side, people often
concentrate on improving an algorithm's worst case behavior.

To make a proper analysis of an algorithm we must consider how different
circumstances affect its performance. The simplest situation is when an algorithm
gives the same performance in every case regardless of the data we pass it.
However, the performance of some algorithms depends on the data you use. Some
algorithms do well with some types of data but terribly with others.
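
For instance, a simple linear search (a hypothetical sketch; the method name is
ours) takes one step when the target happens to be the first element, but `N`
steps when the target is last or missing:

```ruby
# Returns the index of target, or nil if it is absent.
# Best case: target is the first element (1 step).
# Worst case: target is last or missing (N steps).
def linear_search(values, target)
  values.each_with_index do |v, i|
    return i if v == target
  end
  nil
end

linear_search([7, 3, 5], 7)  # best case: found on the first comparison
linear_search([7, 3, 5], 9)  # worst case: scans all N elements
```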

### Why do we care?

While it is possible to solve a problem without considering the complexity of its
algorithms, we might end up losing time if the implemented solution wastes a lot
of resources.

With Complexity Theory we can:

- Predict an algorithm's behavior and avoid those that might not work.
- Compare the performance of different algorithms.
- Prove some of the characteristics of an algorithm.

## Big O Notation

We can use Big O notation to study the performance of an algorithm. Normally,
Big O looks at the upper bound of an algorithm's performance, that is, its worst
case behavior, which, like we mentioned before, is the safest way to look at an
algorithm's performance.

If an algorithm has a good Big O performance then we don't need to worry about it
blowing up on us when implemented in our system.

Big O notation also looks at an algorithm's asymptotic behavior, that is, how
it performs as the size of the problem grows large. Hence we can simply ask:
"What happens to an algorithm's performance as `N` grows very large?"

To figure out an algorithm's Big O notation we can follow these 5 rules:

- If an algorithm performs `f(N)` steps, then it is `O(f(N))`.

```ruby
# Scan the array once, keeping track of the largest value seen so far.
max = values[0]

values.each do |v|
  max = v if v > max
end
```

Since the loop executes `N` times, we can safely say that its performance is
`O(N)`. In other words, this algorithm has order `N` behavior.

- If an algorithm performs `f(N)` steps followed by `g(N)`
steps, then it is `O(f(N) + g(N))`.

  Let's consider the following algorithm for finding the maximum and minimum
  values in an array.

```ruby
# N steps to find the maximum value...
max = values[0]

values.each do |v|
  max = v if v > max
end

# ...followed by another N steps to find the minimum.
min = values[0]

values.each do |v|
  min = v if v < min
end
```

First it performs `N` steps to calculate the maximum value. Then it performs
another `N` steps for the minimum value. So its behavior is `O(N + N)` or
`O(2N)`.

- If `f(N) > g(N)` for a large `N`, then `O(f(N) + g(N)) = O(f(N))`.

Basically this says that with a large `N` we can round. Since `f(N)` is the
larger term, it will eventually dominate the runtime to the point that it is
safe to ignore `g(N)`.

Considering the previous point, we know that if an algorithm performs <code>O(N<sup>2</sup>)</code>
steps followed by `O(N)` steps then it's actually <code>O(N<sup>2</sup> + N)</code>. Let's see how, as `N`
grows, `O(N)` carries less weight in the overall calculation.

| `N` | N<sup>2</sup> | N<sup>2</sup> + N | `N` as % of total |
|---|---|---|---|
| 10 | 100 | 110 | 10% |
| 100 | 10,000 | 10,100 | 1% |
| 1,000 | 1,000,000 | 1,001,000 | 0.1% |
| 10,000 | 100,000,000 | 100,010,000 | 0.01% |
| 100,000 | 10,000,000,000 | 10,000,100,000 | 0.001% |


This works because as `N` increases the extra `N` steps in `g(N)` contribute a
small percentage to the total steps. So, the larger `N` is the less `g(N)`
affects the algorithm's performance.

This rule implicitly tells us that we can ignore constants added to a function.
`O(C + f(N)) = O(f(N))` works because `C` is also a function, one whose value
never changes.

- If an algorithm performs `g(N)` steps for each of `f(N)` steps, then it is
`O(f(N) · g(N))`.

The following algorithm returns `true` if any value in an array is repeated.

```ruby
# For every element, the inner loop scans the whole array again,
# looking for the same value at a different index.
def duplicates?(values)
  (0...values.length).each do |i|
    (0...values.length).each do |j|
      return true if i != j && values[i] == values[j]
    end
  end
  false
end
```

Since the inner loop runs `N` times for each of the `N` values of `i`, we can
say that it's `O(N · N)` or <code>O(N<sup>2</sup>)</code>.

- Ignore constant multiples. `O(C · f(N)) = O(f(N))` and `O(f(C · N)) = O(f(N))`.
This shows that even when we know one algorithm will take roughly `C` times
longer than another to execute, Big O notation can't really compare the
efficiency of these two algorithms. However, it does allow us to compare them to
other algorithms with different Big O notations.

### O(1)

Remember that we can safely ignore constants when we are working with Big O
notation. So `O(1)` applies to any algorithm that performs a fixed number of steps.
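
For example, indexing into an array takes the same few steps no matter how
large the array is (the method name below is ours, for illustration):

```ruby
# A fixed number of steps regardless of how many elements the array holds: O(1).
def middle_element(values)
  values[values.length / 2]
end

middle_element([1, 2, 3])  # => 2, same cost as on a million-element array
```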

### O(N)

This kind of algorithm typically performs one action per input.
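
Summing an array is a typical example; a minimal sketch:

```ruby
# One addition per input element: O(N).
def total(values)
  sum = 0
  values.each { |v| sum += v }
  sum
end

total([1, 2, 3, 4])  # => 10
```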

### O(N<sup>C</sup>)

This includes algorithms with nested loops.

#### N<sup>C</sup>

The exponent `C` can take any value, giving functions such as <code>N<sup>1/2</sup></code> and <code>N<sup>1.5</sup></code>.
In general, for powers of `N`, the bigger the power, the faster the function
grows as `N` increases.

### O(Log N)

A common example of this is algorithms that divide a search space into two
pieces at each step. As far as Big O notation is concerned, all log bases are
the same.
We can use the formula <code>Log<sub>B</sub>(N) = Log<sub>A</sub>(N)/Log<sub>A</sub>(B)</code> to convert from base `A`
to base `B`, where <code>Log<sub>A</sub>(B)</code> is just a constant. Since Big O notation ignores
constant multiples, we can apply whatever conversion factor we like to change
log bases.
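
Binary search is the classic case. A minimal sketch, assuming the input array
is already sorted:

```ruby
# Halve the search space at each step: O(Log N) comparisons.
def binary_search(sorted, target)
  low, high = 0, sorted.length - 1
  while low <= high
    mid = (low + high) / 2
    return mid if sorted[mid] == target
    if sorted[mid] < target
      low = mid + 1
    else
      high = mid - 1
    end
  end
  nil
end

binary_search([1, 3, 5, 7, 9], 7)  # => 3
```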

### O(2<sup>N</sup>)

These algorithms are typically used to make selections from groups of objects.
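
For instance, enumerating every subset of `N` items takes <code>O(2<sup>N</sup>)</code> steps, since
each item is either in or out of a subset. A recursive sketch:

```ruby
# Each element is either included or excluded, giving 2**N subsets in total.
def subsets(items)
  return [[]] if items.empty?
  rest = subsets(items[1..])
  rest + rest.map { |subset| [items[0]] + subset }
end

subsets([1, 2, 3]).length  # => 8, i.e. 2**3
```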

### O(N!)

Algorithms with factorial behavior usually involve arrangements of items because
there are `N` factorial ways you can arrange `N` items.
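
Ruby's built-in `Array#permutation` enumerates those arrangements, so any
algorithm that must inspect each one does `O(N!)` work:

```ruby
# There are 3! = 6 ways to arrange three items.
[1, 2, 3].permutation.to_a.length  # => 6
```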

## Comparing Functions

The slower a runtime function grows, the faster the algorithm. So typically we
would rather work with `O(Log N)` algorithms.

Now, suppose `N` = 1,000. This table shows how long algorithms with different
runtimes would take on a computer that can execute one million steps per second.

| Runtime | Steps Executed | Time |
|---|---|---|
| `O(Log N)` | 10 | 0.00001 sec |
| <code>O(N<sup>1/2</sup>)</code> | 32 | 0.00003 sec |
| `N` | 1,000 | 0.001 sec |
| <code>N<sup>2</sup></code> | 1,000,000 | 1 sec |
| <code>2<sup>N</sup></code> | 1.07x10<sup>301</sup> | 3.40x10<sup>287</sup> years |
| `N!` | 4.02x10<sup>2567</sup> | 1.28x10<sup>2554</sup> years |

Notice that <code>N<sup>2</sup></code> with `N` = 1,000,000 would take 10<sup>12</sup> steps, about 11.6 days on
that computer; even a machine a thousand times faster would need roughly 16
minutes. We could wait that long if we are solving an important problem. Yet,
it's not fast enough for an interactive program.

As we can see in the table, the last two runtimes are already far too slow for
`N` = 1,000, hence they are only useful for really small values of `N`.

Finally, let's see how large a problem the different algorithms could solve in
one second on a computer that can execute one million steps per second.

| Runtime | N |
|---|---|
| `O(Log N)` | >1x10<sup>300,000</sup> |
| <code>O(N<sup>1/2</sup>)</code> | 1 trillion |
| `N` | 1 million |
| <code>N<sup>2</sup></code> | 1 thousand |
| <code>2<sup>N</sup></code> | 20 |
| `N!` | 10 |

## Final Remarks

In this first part of Brief Intro to Algorithms we covered some basic terminology.
We also talked about Big O notation and its importance.
